Optimal Tuning of Continual Online Exploration in Reinforcement Learning

  • Conference paper
Artificial Neural Networks – ICANN 2006 (ICANN 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4131)

Abstract

This paper presents a framework for tuning continual exploration in an optimal way. It first quantifies the rate of exploration by defining the degree of exploration of a state as the entropy of the probability distribution over the admissible actions in that state. The exploration/exploitation tradeoff is then stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulative cost while maintaining fixed degrees of exploration at the nodes. In other words, “exploitation” is maximized for constant “exploration”. This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm, and convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, these equations reduce to the Bellman equations for finding the shortest path when there is no exploration, while a full “blind” exploration is performed when the degree of exploration is maximal.
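
As an illustration of the idea (a minimal sketch, not the paper's exact updating rules), the Python snippet below computes the degree of exploration of a state as the entropy of its action-selection distribution, and runs value-iteration-like sweeps in which each state's policy is a Boltzmann distribution whose parameter is tuned by bisection so that the entropy matches a prescribed degree of exploration. All function names and the toy graph are illustrative assumptions.

```python
import numpy as np

def degree_of_exploration(p):
    """Degree of exploration of a state: entropy (in nats) of the
    probability distribution over its admissible actions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0.0]
    return float(-np.sum(p * np.log(p)))

def boltzmann_policy(q, theta):
    """Softmax over expected costs q: lower cost -> higher probability.
    theta -> infinity approaches the greedy (Bellman) choice, theta -> 0
    gives the uniform, 'blind' exploration distribution."""
    z = np.exp(-theta * (q - q.min()))  # shift costs for numerical stability
    return z / z.sum()

def tune_theta(q, target_entropy, lo=1e-6, hi=1e3, iters=60):
    """Bisection on theta so the policy entropy matches the prescribed
    degree of exploration (entropy decreases monotonically with theta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if degree_of_exploration(boltzmann_policy(q, mid)) > target_entropy:
            lo = mid   # still too exploratory: make the policy greedier
        else:
            hi = mid   # too greedy: soften the policy
    return 0.5 * (lo + hi)

def exploratory_value_iteration(costs, succ, goal, target_entropy, sweeps=500):
    """Value-iteration-like sweeps where each state's policy is the
    entropy-constrained Boltzmann distribution over its actions."""
    V = np.zeros(len(succ))
    for _ in range(sweeps):
        for s in range(len(succ)):
            if s == goal:
                continue  # absorbing goal state, V[goal] = 0
            q = np.array([c + V[t] for c, t in zip(costs[s], succ[s])])
            if len(q) == 1:          # a single admissible action: entropy is 0
                V[s] = float(q[0])
                continue
            p = boltzmann_policy(q, tune_theta(q, target_entropy))
            V[s] = float(p @ q)      # expected cost-to-go under the policy
    return V

# Toy deterministic graph: state 2 is the goal; two routes from state 0.
costs = [[1.0, 4.0], [1.0], []]     # costs[s][a]: cost of action a in state s
succ  = [[1, 2],     [2],   []]     # succ[s][a]: successor state of action a
print(exploratory_value_iteration(costs, succ, goal=2, target_entropy=0.3))
```

For small target entropies the sweep approaches the greedy shortest-path (Bellman) backup, while at the maximal entropy log(number of admissible actions) the policy becomes uniform over the actions, matching the two limiting cases described in the abstract.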

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Achbany, Y., Fouss, F., Yen, L., Pirotte, A., Saerens, M. (2006). Optimal Tuning of Continual Online Exploration in Reinforcement Learning. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_82

  • DOI: https://doi.org/10.1007/11840817_82

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38625-4

  • Online ISBN: 978-3-540-38627-8
