Abstract
We consider the problem of maximizing the average reward in a controlled Markov environment that also contains some arbitrarily varying elements. This problem is captured by a two-person stochastic game model involving the reward-maximizing agent and a second player, who is free to use an arbitrary (non-stationary and unpredictable) control strategy. While the minimax value of the associated zero-sum game provides a guaranteed performance level, the fact that the second player's behavior is observed as the game unfolds opens up the opportunity to improve upon this minimax value whenever the second player deviates from a worst-case strategy. This basic idea has been formalized in the context of repeated matrix games by the classical notion of regret minimization with respect to the Bayes envelope, where an attainable performance goal is defined in terms of the empirical frequencies of the opponent's actions. This paper extends these ideas to problems with Markovian dynamics, under appropriate recurrence conditions. The Bayes envelope is first defined in a natural way in terms of the observed state-action frequencies. As this envelope may not be attainable in general, we define a proper convexification thereof as an attainable solution concept. In the specific case of single-controller games, where the opponent alone controls the state transitions, the Bayes envelope itself turns out to be convex and attainable. Some concrete examples are shown to fit this framework.
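As a concrete illustration of the repeated matrix-game notion the abstract builds on (this sketch is not from the paper; the reward matrix and both action sequences are hypothetical), the Bayes envelope evaluates the best fixed action in hindsight against the empirical frequencies of the opponent's actions, and regret is the gap between that envelope and the reward actually collected:

```python
import numpy as np

# Hypothetical reward matrix for the maximizing (row) player:
# rows index our actions, columns index the opponent's actions.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

rng = np.random.default_rng(0)
T = 10_000
opponent = rng.integers(0, 2, size=T)  # arbitrary (here random) opponent sequence
ours = rng.integers(0, 2, size=T)      # a placeholder, non-adaptive strategy

# Average reward actually obtained over the T rounds.
avg_reward = R[ours, opponent].mean()

# Empirical frequency of the opponent's actions, and the Bayes envelope:
# the reward of the best single action against those frequencies.
q = np.bincount(opponent, minlength=2) / T
bayes_envelope = max(R[a] @ q for a in range(2))

# Regret relative to the Bayes envelope; a regret-minimizing strategy
# drives this quantity to zero (or below) as T grows.
regret = bayes_envelope - avg_reward
```

Since the placeholder strategy here ignores the opponent entirely, its regret merely hovers near zero for this symmetric reward matrix; the adaptive strategies studied in the paper guarantee vanishing regret against every opponent sequence.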
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Mannor, S., Shimkin, N. (2001). Adaptive Strategies and Regret Minimization in Arbitrarily Varying Markov Environments. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4