On the Power of Global Reward Signals in Reinforcement Learning

Kemmerich, Thomas; Kleine Büning, Hans

doi:10.1007/978-3-642-24603-6_7


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6973)

Included in the following conference series:

German Conference on Multiagent System Technologies


Abstract

Reinforcement learning is investigated in various models, involving single-agent and multiagent settings as well as fully and partially observable domains. Although these models differ in several aspects, their basic approach is identical: agents obtain a state observation and a global reward signal from an environment and execute actions, which in turn influence the environment state. In this work, we discuss the role of such global reward signals. We present a concept that provides no visible environment state but only an engineered numerical reward. We prove that this approach has the same computational complexity and expressive power as ordinary fully observable models, but allows assumptions in models with partial observability to be infringed. To avoid such infringements, we then argue that rewards should never contain, besides the true reward value, additional information that is decodable in polynomial time.
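To make the concern about decodable side information concrete, here is a minimal sketch of how a single scalar reward could carry both the true reward and the full environment state. It assumes a finite state set indexed 0 to n-1 and integer-valued true rewards; the encoding `reward * n + state` and the name `RewardCodec` are illustrative assumptions, not the construction used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardCodec:
    """Hypothetical codec that hides a state index inside an integer reward.

    Assumes a finite state set indexed 0..num_states-1 and integer-valued
    true rewards; neither assumption is taken from the paper.
    """

    num_states: int

    def encode(self, state: int, reward: int) -> int:
        """Combine a state index and a true reward into one engineered reward."""
        if not 0 <= state < self.num_states:
            raise ValueError("state index out of range")
        # The true reward is shifted by a factor of num_states; the lowest
        # base-num_states "digit" carries the state index.
        return reward * self.num_states + state

    def decode(self, engineered: int) -> tuple[int, int]:
        """Recover (state, reward) from an engineered reward."""
        # divmod with a positive divisor yields a remainder in [0, num_states),
        # so this inverts encode() even for negative rewards, using a constant
        # number of arithmetic operations.
        reward, state = divmod(engineered, self.num_states)
        return state, reward


if __name__ == "__main__":
    codec = RewardCodec(num_states=5)
    signal = codec.encode(state=3, reward=-2)
    print(signal, codec.decode(signal))
```

Because `decode` needs only a single division, an agent receiving such a signal effectively observes the state, which is exactly the kind of polynomial-time decodable side information the abstract argues a reward should not contain.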

Author information

Authors and Affiliations

Thomas Kemmerich, International Graduate School Dynamic Intelligent Systems, University of Paderborn, 33095 Paderborn, Germany
Hans Kleine Büning, Department of Computer Science, University of Paderborn, 33095 Paderborn, Germany

Editor information

Editors and Affiliations

Franziska Klügl, Modeling and Simulation Research Center, Örebro University, 70182 Örebro, Sweden
Sascha Ossowski, University Rey Juan Carlos, ETSII / CETINIA, Calle Tulipán s/n, 28933 Móstoles (Madrid), Spain

About this paper

Cite this paper

Kemmerich, T., Kleine Büning, H. (2011). On the Power of Global Reward Signals in Reinforcement Learning. In: Klügl, F., Ossowski, S. (eds) Multiagent System Technologies. MATES 2011. Lecture Notes in Computer Science, vol 6973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24603-6_7

DOI: https://doi.org/10.1007/978-3-642-24603-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24602-9
Online ISBN: 978-3-642-24603-6
eBook Packages: Computer Science, Computer Science (R0)