A Gentle Introduction to Reinforcement Learning

  • Conference paper: Scalable Uncertainty Management (SUM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9858)

Abstract

This paper provides a gentle introduction to some of the basics of reinforcement learning, as well as pointers to more advanced topics within the field.

Notes

  1. Do not dismiss these results as only academically interesting due to the ‘game’ nature of the problems: the complexity of these problems approaches and surpasses that of many more practically useful applications [28]. Furthermore, the past has shown that breakthrough advances in games have led to breakthroughs in other fields. Monte Carlo Tree Search, initially developed for Go, is one example [8].

  2. There may be many, although their (action-)value functions will all be the same.

  3. Not to mention continuous action spaces, which are not considered in this paper.

  4. Dyna-Q combines Q-learning with learning a transition model. This (approximate) model is then used to generate simulated samples for the Q-learner. Real-life samples and simulated samples can be arbitrarily intertwined. This principle is also referred to as planning in an RL context (a sketch follows after these notes).

  5. To appear; it will be available online at ai.vub.ac.be.
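
As a rough illustration of the Dyna-Q principle described in note 4, the following Python sketch combines tabular Q-learning with a learned deterministic transition model. The environment interface (env.reset(), env.step(), env.actions()) and all hyperparameters are hypothetical assumptions made for illustration; they are not taken from the paper.

```python
# Minimal tabular Dyna-Q sketch (illustrative only).
# Assumptions (not from the paper): a small episodic environment exposing
#   env.reset() -> state, env.step(action) -> (next_state, reward, done),
#   env.actions(state) -> list of actions; states and actions are hashable.
import random
from collections import defaultdict


def dyna_q(env, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)   # Q[(state, action)] -> action-value estimate
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def epsilon_greedy(state):
        actions = env.actions(state)
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def backup(s, a, r, s2, done):
        # Standard Q-learning update from one (real or simulated) sample.
        best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions(s2))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(state)
            next_state, reward, done = env.step(action)

            # 1) Direct RL: learn from the real sample.
            backup(state, action, reward, next_state, done)

            # 2) Model learning: remember the observed transition
            #    (a deterministic, last-visit model for simplicity).
            model[(state, action)] = (reward, next_state, done)

            # 3) Planning: replay simulated samples drawn from the model,
            #    interleaved arbitrarily with the real samples.
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = random.choice(list(model.items()))
                backup(s, a, r, s2, d)

            state = next_state
    return Q
```

With planning_steps set to 0 this reduces to plain Q-learning; increasing it trades extra computation per real step for better sample efficiency.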

References

  1. Albus, J.S.: Brains, Behavior, and Robotics. Byte Books, Peterborough (1981)

  2. Amazon: Amazon prime air (2016). http://www.amazon.com/b?node=8037720011. Accessed 20 Apr 2016

  3. Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: Proceedings of the 25th International Conference on Machine Learning, pp. 41–47. ACM (2008)

  4. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 5, 834–846 (1983)

  5. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (1995)

  6. Bhattacharya, R., Waymire, E.C.: A Basic Course in Probability Theory. Springer, New York (2007)

  7. Bloembergen, D., Tuyls, K., Hennes, D., Kaisers, M.: Evolutionary dynamics of multi-agent learning: a survey. J. Artif. Intell. Res. 53, 659–697 (2015)

  8. Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)

  9. Brys, T., Harutyunyan, A., Suay, H.B., Chernova, S., Taylor, M.E., Nowé, A.: Reinforcement learning from demonstration through shaping. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 3352–3358 (2015)

  10. Das, I., Dennis, J.E.: A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optim. 14(1), 63–69 (1997)

  11. Devlin, S., Kudenko, D.: Dynamic potential-based reward shaping. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, vol. 1, pp. 433–440. International Foundation for Autonomous Agents and Multiagent Systems (2012)

  12. Devlin, S., Kudenko, D., Grześ, M.: An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Adv. Complex Syst. 14(02), 251–278 (2011)

  13. Gábor, Z., Kalmár, Z., Szepesvári, C.: Multi-criteria reinforcement learning. In: ICML, vol. 98, pp. 197–205 (1998)

  14. Glorennec, P.Y.: Fuzzy Q-learning and evolutionary strategy for adaptive fuzzy control. EUFIT 94(1521), 35–40 (1994)

  15. Google: Google self-driving car project. Accessed 20 Apr 2016

  16. Harutyunyan, A., Devlin, S., Vrancx, P., Nowé, A.: Expressing arbitrary reward functions as potential-based advice. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  17. Klopf, A.H.: Brain function, adaptive systems: a heterostatic theory. Technical report AFCRL-72-0164, Air Force Cambridge Research Laboratories, Bedford, MA (1972)

  18. Knox, W.B., Stone, P.: Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, pp. 5–12 (2010)

  19. Lizotte, D.J., Bowling, M.H., Murphy, S.A.: Efficient reinforcement learning with multiple reward functions for randomized controlled trial analysis. In: Proceedings of the 27th International Conference on Machine Learning (ICML-2010), pp. 695–702 (2010)

  20. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  21. Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the Sixteenth International Conference on Machine Learning, vol. 99, pp. 278–287 (1999)

  22. Nowé, A.: Fuzzy reinforcement learning: an overview. In: Advances in Fuzzy Theory and Technology (1995)

  23. Nowé, A., Vrancx, P., De Hauwere, Y.-M.: Game theory and multi-agent reinforcement learning. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning. ALO, vol. 12, pp. 441–470. Springer, Heidelberg (2012)

  24. Roijers, D.M., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multi-objective sequential decision-making. J. Artif. Intell. Res. 48, 67–113 (2013)

  25. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Artificial Intelligence, vol. 25, p. 27. Prentice-Hall, Englewood Cliffs (1995)

  26. Sehnke, F., Graves, A., Osendorfer, C., Schmidhuber, J.: Multimodal parameter-exploring policy gradients. In: Ninth International Conference on Machine Learning and Applications (ICMLA), pp. 113–118. IEEE (2010)

  27. Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., Schmidhuber, J.: Parameter-exploring policy gradients. Neural Netw. 23(4), 551–559 (2010)

  28. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  29. Singh, S., Jaakkola, T., Littman, M.L., Szepesvári, C.: Convergence results for single-step on-policy reinforcement-learning algorithms. Mach. Learn. 38(3), 287–308 (2000)

  30. Singh, S.P., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22(1–3), 123–158 (1996)

  31. Skinner, B.F.: The Behavior of Organisms: An Experimental Analysis. Appleton-Century, New York (1938)

  32. Sutton, R.: The future of AI (2006). https://www.youtube.com/watch?v=pD-FWetbvN8. Accessed 28 June 2016

  33. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction, vol. 1. Cambridge University Press, Cambridge (1998)

  34. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y., et al.: Policy gradient methods for reinforcement learning with function approximation. In: NIPS, vol. 99, pp. 1057–1063 (1999)

  35. Taylor, M.E.: Autonomous Inter-Task Transfer in Reinforcement Learning Domains. ProQuest, Ann Arbor (2008)

  36. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)

  37. Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Mach. Learn. 16(3), 185–202 (1994)

  38. Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., Dekker, E.: Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2010)

  39. Vamplew, P., Yearwood, J., Dazeley, R., Berry, A.: On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. In: Wobcke, W., Zhang, M. (eds.) AI 2008. LNCS (LNAI), vol. 5360, pp. 372–378. Springer, Heidelberg (2008)

  40. Van Moffaert, K.: Multi-criteria reinforcement learning for sequential decision making problems. Ph.D. thesis, Vrije Universiteit Brussel (2016)

  41. Van Moffaert, K., Drugan, M.M., Nowé, A.: Scalarized multi-objective reinforcement learning: novel design techniques. In: IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE (2013)

  42. Van Moffaert, K., Nowé, A.: Multi-objective reinforcement learning using sets of Pareto dominating policies. J. Mach. Learn. Res. 15(1), 3483–3512 (2014)

  43. Wang, W., Sebag, M., et al.: Multi-objective Monte-Carlo tree search. In: ACML, pp. 507–522 (2012)

  44. Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge (1989)

  45. Wiering, M., van Otterlo, M. (eds.): Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization, vol. 12. Springer, Berlin (2012)

  46. Wiewiora, E., Cottrell, G., Elkan, C.: Principled methods for advising reinforcement learning agents. In: International Conference on Machine Learning, pp. 792–799 (2003)

Author information

Corresponding author

Correspondence to Ann Nowé.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nowé, A., Brys, T. (2016). A Gentle Introduction to Reinforcement Learning. In: Schockaert, S., Senellart, P. (eds) Scalable Uncertainty Management. SUM 2016. Lecture Notes in Computer Science (LNAI), vol 9858. Springer, Cham. https://doi.org/10.1007/978-3-319-45856-4_2

  • DOI: https://doi.org/10.1007/978-3-319-45856-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45855-7

  • Online ISBN: 978-3-319-45856-4

  • eBook Packages: Computer Science, Computer Science (R0)
