Abstract
Reinforcement learning is studied in a variety of models, covering single-agent and multiagent settings as well as fully and partially observable domains. Although these models differ in several respects, their basic approach is the same: agents receive a state observation and a global reward signal from an environment and execute actions that in turn influence the environment state. In this work, we discuss the role of such global reward signals. We present a concept in which the environment state is not visible and the agents receive only a numerical, engineered reward. We prove that this approach has the same computational complexity and expressive power as ordinary fully observable models, but that it allows assumptions of models with partial observability to be violated. To avoid such violations, we argue that a reward should never contain polynomial-time decodable information beyond the true reward value.
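To make the idea of an engineered reward concrete, the following minimal sketch shows one way a single number could carry both a true reward and hidden state information that an agent decodes in polynomial time. It is an illustration only and not the construction used in the paper: the packing scheme, the constant `NUM_STATES`, and the functions `encode_reward` and `decode_reward` are all assumptions introduced here.

```python
from typing import Tuple

# Hypothetical illustration, not the paper's construction: a finite state
# space of assumed size NUM_STATES and non-negative integer rewards.
NUM_STATES = 16


def encode_reward(true_reward: int, state_index: int) -> int:
    """Pack the state index into the low-order part of an integer reward."""
    if not 0 <= state_index < NUM_STATES:
        raise ValueError("state_index out of range")
    if true_reward < 0:
        raise ValueError("this sketch assumes non-negative integer rewards")
    return true_reward * NUM_STATES + state_index


def decode_reward(signal: int) -> Tuple[int, int]:
    """Recover (true_reward, state_index) with a single integer division."""
    return divmod(signal, NUM_STATES)


# An agent that sees only `signal` can still reconstruct the hidden state.
signal = encode_reward(true_reward=3, state_index=5)
recovered_reward, recovered_state = decode_reward(signal)
```

Under these assumptions, an agent that observes nothing but the reward signal still obtains the full state, which is the kind of additional decodable information the abstract argues a reward should not contain.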
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kemmerich, T., Kleine Büning, H. (2011). On the Power of Global Reward Signals in Reinforcement Learning. In: Klügl, F., Ossowski, S. (eds) Multiagent System Technologies. MATES 2011. Lecture Notes in Computer Science, vol 6973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24603-6_7
DOI: https://doi.org/10.1007/978-3-642-24603-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24602-9
Online ISBN: 978-3-642-24603-6
eBook Packages: Computer Science, Computer Science (R0)