A Gentle Introduction to Reinforcement Learning

  • Conference paper: Scalable Uncertainty Management (SUM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9858)

Abstract

This paper provides a gentle introduction to some of the basics of reinforcement learning, as well as pointers to more advanced topics within the field.

Notes

  1. Do not dismiss these results as only academically interesting due to the ‘game’ nature of the problems: the complexity of these problems approaches and surpasses that of many more practically useful applications [28]. Furthermore, the past has shown that breakthrough advances in games have led to breakthroughs in other fields. Monte Carlo Tree Search, initially developed for Go, is one example [8].

  2. There may be many, although their (action-)value functions will all be the same.

  3. Not to mention continuous action spaces, which are not considered in this paper.

  4. Dyna-Q combines Q-learning with learning a transition model. This (approximate) model is then used to generate simulated samples for the Q-learner. Real-life samples and simulated samples can be arbitrarily intertwined. This principle is also referred to as planning in an RL context (a sketch follows after these notes).

  5. To appear; it will be available online at ai.vub.ac.be.
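
As a rough illustration of the Dyna-Q principle described in note 4, the following Python sketch combines tabular Q-learning with a learned deterministic transition model. The environment interface (env.reset(), env.step(), env.actions()) and all hyperparameters are hypothetical assumptions made for illustration; they are not taken from the paper.

```python
# Minimal tabular Dyna-Q sketch (illustrative only).
# Assumptions (not from the paper): a small episodic environment exposing
#   env.reset() -> state, env.step(action) -> (next_state, reward, done),
#   env.actions(state) -> list of actions; states and actions are hashable.
import random
from collections import defaultdict


def dyna_q(env, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)   # Q[(state, action)] -> action-value estimate
    model = {}               # model[(state, action)] -> (reward, next_state, done)

    def epsilon_greedy(state):
        actions = env.actions(state)
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def backup(s, a, r, s2, done):
        # Standard Q-learning update from one (real or simulated) sample.
        best_next = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions(s2))
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(state)
            next_state, reward, done = env.step(action)

            # 1) Direct RL: learn from the real sample.
            backup(state, action, reward, next_state, done)

            # 2) Model learning: remember the observed transition
            #    (a deterministic, last-visit model for simplicity).
            model[(state, action)] = (reward, next_state, done)

            # 3) Planning: replay simulated samples drawn from the model,
            #    interleaved arbitrarily with the real samples.
            for _ in range(planning_steps):
                (s, a), (r, s2, d) = random.choice(list(model.items()))
                backup(s, a, r, s2, d)

            state = next_state
    return Q
```

With planning_steps set to 0 this reduces to plain Q-learning; increasing it trades extra computation per real step for better sample efficiency.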

References

  1. Albus, J.S.: Brains, Behavior, and Robotics. Byte Books, Peterborough (1981)

  2. Amazon: Amazon prime air (2016). http://www.amazon.com/b?node=8037720011. Accessed 20 Apr 2016

  3. Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: Proceedings of the 25th International Conference on Machine Learning, pp. 41–47. ACM (2008)

  4. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 5, 834–846 (1983)

  5. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1. Athena Scientific, Belmont (1995)

  6. Bhattacharya, R., Waymire, E.C.: A Basic Course in Probability Theory. Springer, New York (2007)

  7. Bloembergen, D., Tuyls, K., Hennes, D., Kaisers, M.: Evolutionary dynamics of multi-agent learning: a survey. J. Artif. Intell. Res. 53, 659–697 (2015)

  8. Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)

  9. Brys, T., Harutyunyan, A., Suay, H.B., Chernova, S., Taylor, M.E., Nowé, A.: Reinforcement learning from demonstration through shaping. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 3352–3358 (2015)

  10. Das, I., Dennis, J.E.: A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Struct. Optim. 14(1), 63–69 (1997)

  11. Devlin, S., Kudenko, D.: Dynamic potential-based reward shaping. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, vol. 1, pp. 433–440. International Foundation for Autonomous Agents and Multiagent Systems (2012)

  12. Devlin, S., Kudenko, D., Grześ, M.: An empirical study of potential-based reward shaping and advice in complex, multi-agent systems. Adv. Complex Syst. 14(02), 251–278 (2011)

  13. Gábor, Z., Kalmár, Z., Szepesvári, C.: Multi-criteria reinforcement learning. In: ICML, vol. 98, pp. 197–205 (1998)

  14. Glorennec, P.Y.: Fuzzy Q-learning and evolutionary strategy for adaptive fuzzy control. EUFIT 94(1521), 35–40 (1994)

  15. Google: Google self-driving car project. Accessed 20 Apr 2016

  16. Harutyunyan, A., Devlin, S., Vrancx, P., Nowé, A.: Expressing arbitrary reward functions as potential-based advice. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  17. Klopf, A.H.: Brain function, adaptive systems: a heterostatic theory. Technical report AFCRL-72-0164, Air Force Cambridge Research Laboratories, Bedford, MA (1972)

  18. Knox, W.B., Stone, P.: Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, pp. 5–12 (2010)

  19. Lizotte, D.J., Bowling, M.H., Murphy, S.A.: Efficient reinforcement learning with multiple reward functions for randomized controlled trial analysis. In: Proceedings of the 27th International Conference on Machine Learning (ICML-2010), pp. 695–702 (2010)

  20. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  21. Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the Sixteenth International Conference on Machine Learning, vol. 99, pp. 278–287 (1999)

  22. Nowé, A.: Fuzzy reinforcement learning: an overview. In: Advances in Fuzzy Theory and Technology (1995)

  23. Nowé, A., Vrancx, P., De Hauwere, Y.-M.: Game theory and multi-agent reinforcement learning. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning. ALO, vol. 12, pp. 441–470. Springer, Heidelberg (2012)

  24. Roijers, D.M., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multi-objective sequential decision-making. J. Artif. Intell. Res. 48, 67–113 (2013)

  25. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Artificial Intelligence, vol. 25, p. 27. Prentice-Hall, Englewood Cliffs (1995)

  26. Sehnke, F., Graves, A., Osendorfer, C., Schmidhuber, J.: Multimodal parameter-exploring policy gradients. In: Ninth International Conference on Machine Learning and Applications (ICMLA), pp. 113–118. IEEE (2010)

  27. Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., Schmidhuber, J.: Parameter-exploring policy gradients. Neural Netw. 23(4), 551–559 (2010)

  28. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  29. Singh, S., Jaakkola, T., Littman, M.L., Szepesvári, C.: Convergence results for single-step on-policy reinforcement-learning algorithms. Mach. Learn. 38(3), 287–308 (2000)

  30. Singh, S.P., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22(1–3), 123–158 (1996)

  31. Skinner, B.F.: The Behavior of Organisms: An Experimental Analysis. Appleton-Century, New York (1938)

  32. Sutton, R.: The future of AI (2006). https://www.youtube.com/watch?v=pD-FWetbvN8. Accessed 28 June 2016

  33. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction, vol. 1. Cambridge University Press, Cambridge (1998)

  34. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y., et al.: Policy gradient methods for reinforcement learning with function approximation. In: NIPS, vol. 99, pp. 1057–1063 (1999)

  35. Taylor, M.E.: Autonomous Inter-Task Transfer in Reinforcement Learning Domains. ProQuest, Ann Arbor (2008)

  36. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)

  37. Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Mach. Learn. 16(3), 185–202 (1994)

  38. Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., Dekker, E.: Empirical evaluation methods for multiobjective reinforcement learning algorithms. Mach. Learn. 84(1–2), 51–80 (2010)

  39. Vamplew, P., Yearwood, J., Dazeley, R., Berry, A.: On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. In: Wobcke, W., Zhang, M. (eds.) AI 2008. LNCS (LNAI), vol. 5360, pp. 372–378. Springer, Heidelberg (2008)

  40. Van Moffaert, K.: Multi-criteria reinforcement learning for sequential decision making problems. Ph.D. thesis, Vrije Universiteit Brussel (2016)

  41. Van Moffaert, K., Drugan, M.M., Nowé, A.: Scalarized multi-objective reinforcement learning: novel design techniques. In: IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE (2013)

  42. Van Moffaert, K., Nowé, A.: Multi-objective reinforcement learning using sets of Pareto dominating policies. J. Mach. Learn. Res. 15(1), 3483–3512 (2014)

  43. Wang, W., Sebag, M., et al.: Multi-objective Monte-Carlo tree search. In: ACML, pp. 507–522 (2012)

  44. Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, University of Cambridge (1989)

  45. Wiering, M., van Otterlo, M. (eds.): Reinforcement Learning: State-of-the-Art. Adaptation, Learning, and Optimization, vol. 12. Springer, Berlin (2012)

  46. Wiewiora, E., Cottrell, G., Elkan, C.: Principled methods for advising reinforcement learning agents. In: International Conference on Machine Learning, pp. 792–799 (2003)

Author information

Corresponding author

Correspondence to Ann Nowé.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nowé, A., Brys, T. (2016). A Gentle Introduction to Reinforcement Learning. In: Schockaert, S., Senellart, P. (eds) Scalable Uncertainty Management. SUM 2016. Lecture Notes in Computer Science (LNAI), vol 9858. Springer, Cham. https://doi.org/10.1007/978-3-319-45856-4_2

  • DOI: https://doi.org/10.1007/978-3-319-45856-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45855-7

  • Online ISBN: 978-3-319-45856-4

  • eBook Packages: Computer Science, Computer Science (R0)
