
Embedding a Priori Knowledge in Reinforcement Learning


Abstract

In recent years, temporal difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Moreover, the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance is struck between risky use of imprecise prior knowledge and cautious use of learning experience.
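
As a rough illustration only (the paper's actual algorithms and their analysis are not reproduced here), the sketch below shows one way imprecise prior knowledge can be embedded in a temporal difference learner without altering the structure of the update rule: the value table is initialised from a prior heuristic, and each observed transition also updates similar states, so that experience generalises across the state-action space. All names (prior_q, similarity, update) and the similarity kernel are assumptions made for this illustration, not the author's exact method.

```python
import numpy as np

# Minimal sketch, not the paper's exact algorithm: tabular Q-learning in which
# rough a priori knowledge enters in two illustrative (assumed) ways:
#   1. the Q-table is initialised from an imprecise prior estimate of action
#      values, and
#   2. each observed transition also updates similar states, weighted by a
#      similarity kernel, so experience generalises across the state space.

n_states, n_actions = 25, 4
alpha, gamma = 0.1, 0.95

def prior_q(s, a):
    # Imprecise prior knowledge, e.g. a coarse distance-to-goal heuristic.
    return -0.1 * abs(s - (n_states - 1))

def similarity(s1, s2):
    # Simple kernel: neighbouring states are treated as similar (hypothetical).
    return np.exp(-abs(s1 - s2))

# Risky use of prior knowledge: initialise Q from the rough estimate.
Q = np.array([[prior_q(s, a) for a in range(n_actions)]
              for s in range(n_states)], dtype=float)

def update(s, a, r, s_next):
    """One learning step: standard Q-learning target, spread to similar states."""
    target = r + gamma * Q[s_next].max()
    for s2 in range(n_states):
        w = similarity(s, s2)
        if w > 1e-3:
            # Cautious use of experience: weight the correction by similarity.
            Q[s2, a] += alpha * w * (target - Q[s2, a])

# Example step: transition (state 3, action 1) -> reward -1, next state 4.
update(3, 1, -1.0, 4)
```

The trade-off highlighted in the abstract appears in this sketch as the balance between how strongly the prior heuristic biases the initial table and how broadly the similarity kernel spreads each correction.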




About this article

Cite this article

Ribeiro, C.H.C. Embedding a Priori Knowledge in Reinforcement Learning. Journal of Intelligent and Robotic Systems 21, 51–71 (1998). https://doi.org/10.1023/A:1007968115863
