Skip to main content

On the Power of Global Reward Signals in Reinforcement Learning

  • Conference paper
Book cover Multiagent System Technologies (MATES 2011)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6973))

Included in the following conference series:

  • 670 Accesses

Abstract

Reinforcement learning is investigated in various models, involving single and multiagent settings as well as fully or partially observable domains. Although such models differ in several aspects, their basic approach is identical: agents obtain a state observation and a global reward signal from an environment and execute actions which in turn influence the environment state. In this work, we discuss the role of such global reward signals. We present a concept that does not provide a visible environment state but only offers a numerical engineered reward. It will be proven that this approach has the same computational complexity and expressive power as ordinary fully observable models, but allows to infringe assumptions in models with partial observability. To avoid such infringements, we then argue that rewards, besides a true reward value, shall never contain additional polynomial time decodable information.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 54.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 69.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bagnell, J.A., Ng, A.Y.: On local rewards and scaling distributed reinforcement learning. In: Advances in Neural Information Processing Systems, NIPS 2005 (2005)

    Google Scholar 

  2. Bernstein, D.S., Givan, R., Immerman, N., Zilberstein, S.: The complexity of decentralized control of markov decision processes. Math. Oper. Res. 27, 819–840 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  3. Bernstein, D.S., Hansen, E.A., Zilberstein, S.: Dynamic programming for partially observable stochastic games. In: AAAI, pp. 709–715. AAAI Press / The MIT Press (2004)

    Google Scholar 

  4. Buşoniu, L., Babuška, R., De Schutter, B.: Multi-agent reinforcement learning: An overview. In: Srinivasan, D., Jain, L.C. (eds.) Innovations in Multi-Agent Systems and Applications - 1. SCI, vol. 310, pp. 183–221. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  5. Chang, Y.H., Ho, T., Kaelbling, L.P.: All learning is local: Multi-agent learning in global reward games. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) NIPS. MIT Press, Cambridge (2003)

    Google Scholar 

  6. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  7. Devlin, S., Kudenko, D.: Theoretical considerations of potential-based reward shaping for multi-agent systems. In: Proc. of 10th Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2011), pp. 225–232 (2011)

    Google Scholar 

  8. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York (1979)

    MATH  Google Scholar 

  9. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. J. Artif. Intell. Res. 101(1-2), 99–134 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  10. Kaelbling, L.P., Littman, M.L., Moore, A.P.: Reinforcement learning: A survey. J. Artif. Intell. Res. 4, 237–285 (1996)

    Google Scholar 

  11. Kemmerich, T., Kleine Büning, H.: A convergent multiagent reinforcement learning approach for a subclass of cooperative stochastic games. In: Proc. of the Adaptive Learning Agents Workshop @ AAMAS 2011, pp. 75–82 (2011)

    Google Scholar 

  12. Kemmerich, T., Kleine Büning, H.: Region-based heuristics for an iterative partitioning problem in multiagent systems. In: Proc. 3rd Intl. Conf. on Agents and Artificial Intelligence (ICAART 2011), vol. 2, pp. 200–205. SciTePress (2011)

    Google Scholar 

  13. Melo, F.S., Ribeiro, I.: Transition entropy in partially observable markov decision processes. In: Arai, T., Pfeifer, R., Balch, T.R., Yokoi, H. (eds.) IAS, pp. 282–289. IOS Press, Amsterdam (2006)

    Google Scholar 

  14. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

    MATH  Google Scholar 

  15. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: Theory and application to reward shaping. In: Bratko, I., Dzeroski, S. (eds.) ICML, pp. 278–287. Morgan Kaufmann, San Francisco (1999)

    Google Scholar 

  16. Oliehoek, F.A., Spaan, M.T.J., Vlassis, N.A.: Optimal and approximate Q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 32, 289–353 (2008)

    MathSciNet  MATH  Google Scholar 

  17. Seuken, S., Zilberstein, S.: Formal models and algorithms for decentralized decision making under uncertainty. Autonomous Agents and Multi-Agent Systems 17(2), 190–250 (2008)

    Article  Google Scholar 

  18. Stone, P., Sutton, R.S., Kuhlmann, G.: Reinforcement learning for robocup-soccer keepaway. Adaptive Behavior 13(3), 165–188 (2005)

    Article  Google Scholar 

  19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, Cambridge (1998)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kemmerich, T., Kleine Büning, H. (2011). On the Power of Global Reward Signals in Reinforcement Learning. In: Klügl, F., Ossowski, S. (eds) Multiagent System Technologies. MATES 2011. Lecture Notes in Computer Science(), vol 6973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24603-6_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-24603-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24602-9

  • Online ISBN: 978-3-642-24603-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics