Optimal Tuning of Continual Online Exploration in Reinforcement Learning

  • Conference paper
Artificial Neural Networks – ICANN 2006 (ICANN 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4131)

Abstract

This paper presents a framework for tuning continual exploration in an optimal way. It first quantifies the rate of exploration by defining the degree of exploration of a state as the entropy of the probability distribution over the admissible actions in that state. The exploration/exploitation tradeoff is then stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulative cost while maintaining fixed degrees of exploration at the nodes. In other words, “exploitation” is maximized for constant “exploration”. This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm, and convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, these equations reduce to the Bellman equations for finding the shortest path when there is no exploration, while a full “blind” exploration is performed when the degree of exploration is maximal.
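
As an illustration of the idea (a minimal sketch, not the paper's exact updating rules), the Python snippet below computes the degree of exploration of a state as the entropy of its action-selection distribution, and runs value-iteration-like sweeps in which each state's policy is a Boltzmann distribution whose parameter is tuned by bisection so that the entropy matches a prescribed degree of exploration. All function names and the toy graph are illustrative assumptions.

```python
import numpy as np

def degree_of_exploration(p):
    """Degree of exploration of a state: entropy (in nats) of the
    probability distribution over its admissible actions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0.0]
    return float(-np.sum(p * np.log(p)))

def boltzmann_policy(q, theta):
    """Softmax over expected costs q: lower cost -> higher probability.
    theta -> infinity approaches the greedy (Bellman) choice, theta -> 0
    gives the uniform, 'blind' exploration distribution."""
    z = np.exp(-theta * (q - q.min()))  # shift costs for numerical stability
    return z / z.sum()

def tune_theta(q, target_entropy, lo=1e-6, hi=1e3, iters=60):
    """Bisection on theta so the policy entropy matches the prescribed
    degree of exploration (entropy decreases monotonically with theta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if degree_of_exploration(boltzmann_policy(q, mid)) > target_entropy:
            lo = mid   # still too exploratory: make the policy greedier
        else:
            hi = mid   # too greedy: soften the policy
    return 0.5 * (lo + hi)

def exploratory_value_iteration(costs, succ, goal, target_entropy, sweeps=500):
    """Value-iteration-like sweeps where each state's policy is the
    entropy-constrained Boltzmann distribution over its actions."""
    V = np.zeros(len(succ))
    for _ in range(sweeps):
        for s in range(len(succ)):
            if s == goal:
                continue  # absorbing goal state, V[goal] = 0
            q = np.array([c + V[t] for c, t in zip(costs[s], succ[s])])
            if len(q) == 1:          # a single admissible action: entropy is 0
                V[s] = float(q[0])
                continue
            p = boltzmann_policy(q, tune_theta(q, target_entropy))
            V[s] = float(p @ q)      # expected cost-to-go under the policy
    return V

# Toy deterministic graph: state 2 is the goal; two routes from state 0.
costs = [[1.0, 4.0], [1.0], []]     # costs[s][a]: cost of action a in state s
succ  = [[1, 2],     [2],   []]     # succ[s][a]: successor state of action a
print(exploratory_value_iteration(costs, succ, goal=2, target_entropy=0.3))
```

For small target entropies the sweep approaches the greedy shortest-path (Bellman) backup, while at the maximal entropy log(number of admissible actions) the policy becomes uniform over the actions, matching the two limiting cases described in the abstract.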

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Achbany, Y., Fouss, F., Yen, L., Pirotte, A., Saerens, M. (2006). Optimal Tuning of Continual Online Exploration in Reinforcement Learning. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds) Artificial Neural Networks – ICANN 2006. ICANN 2006. Lecture Notes in Computer Science, vol 4131. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11840817_82

  • DOI: https://doi.org/10.1007/11840817_82

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38625-4

  • Online ISBN: 978-3-540-38627-8
