
Embedding a Priori Knowledge in Reinforcement Learning


Abstract

In recent years, temporal difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Moreover, the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are kept. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance is struck between risky use of imprecise prior knowledge and cautious use of learning experience.
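
As a rough illustration only (the paper's actual algorithms and their analysis are not reproduced here), the sketch below shows one way imprecise prior knowledge can be embedded in a temporal difference learner without altering the structure of the update rule: the value table is initialised from a prior heuristic, and each observed transition also updates similar states, so that experience generalises across the state-action space. All names (prior_q, similarity, update) and the similarity kernel are assumptions made for this illustration, not the author's exact method.

```python
import numpy as np

# Minimal sketch, not the paper's exact algorithm: tabular Q-learning in which
# rough a priori knowledge enters in two illustrative (assumed) ways:
#   1. the Q-table is initialised from an imprecise prior estimate of action
#      values, and
#   2. each observed transition also updates similar states, weighted by a
#      similarity kernel, so experience generalises across the state space.

n_states, n_actions = 25, 4
alpha, gamma = 0.1, 0.95

def prior_q(s, a):
    # Imprecise prior knowledge, e.g. a coarse distance-to-goal heuristic.
    return -0.1 * abs(s - (n_states - 1))

def similarity(s1, s2):
    # Simple kernel: neighbouring states are treated as similar (hypothetical).
    return np.exp(-abs(s1 - s2))

# Risky use of prior knowledge: initialise Q from the rough estimate.
Q = np.array([[prior_q(s, a) for a in range(n_actions)]
              for s in range(n_states)], dtype=float)

def update(s, a, r, s_next):
    """One learning step: standard Q-learning target, spread to similar states."""
    target = r + gamma * Q[s_next].max()
    for s2 in range(n_states):
        w = similarity(s, s2)
        if w > 1e-3:
            # Cautious use of experience: weight the correction by similarity.
            Q[s2, a] += alpha * w * (target - Q[s2, a])

# Example step: transition (state 3, action 1) -> reward -1, next state 4.
update(3, 1, -1.0, 4)
```

The trade-off highlighted in the abstract appears in this sketch as the balance between how strongly the prior heuristic biases the initial table and how broadly the similarity kernel spreads each correction.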




About this article

Cite this article

Ribeiro, C.H.C. Embedding a Priori Knowledge in Reinforcement Learning. Journal of Intelligent and Robotic Systems 21, 51–71 (1998). https://doi.org/10.1023/A:1007968115863
