B-Learning: A reinforcement learning algorithm, comparison with dynamic programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 686))

Abstract

In this paper we present a Reinforcement Learning method, B-Learning, for the control of a water production plant. A comparison between B-Learning and Dynamic Programming is provided from both theoretical and performance points of view. It is shown that Reinforcement-based neural control can lead to results comparable in quality to those obtained with Dynamic Programming, while being less computationally expensive.
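The cost difference the abstract alludes to can be sketched on a toy problem. The chain below is a hypothetical 5-state example, not the paper's water-plant model, and the update shown is a standard TD(0) rule, not B-Learning itself: a Dynamic Programming sweep must update every state on each iteration, while a temporal-difference update touches only the state just visited.

```python
import random

N_STATES = 5
GAMMA = 0.9

def reward(s: int) -> float:
    # -1 in the last ("forbidden") state, 0 elsewhere
    return -1.0 if s == N_STATES - 1 else 0.0

def step(s: int) -> int:
    # deterministic drift to the right; the forbidden state self-loops
    return min(s + 1, N_STATES - 1)

# Dynamic Programming: one sweep updates the value of *every* state.
def dp_sweep(v):
    return [reward(s) + GAMMA * v[step(s)] for s in range(N_STATES)]

# TD(0): one update touches only the state actually visited.
def td_update(v, s, alpha=0.5):
    s_next = step(s)
    v[s] += alpha * (reward(s) + GAMMA * v[s_next] - v[s])
    return s_next

v_dp = [0.0] * N_STATES
for _ in range(200):
    v_dp = dp_sweep(v_dp)

random.seed(0)
v_td = [0.0] * N_STATES
for _ in range(3000):
    s = random.randrange(N_STATES)  # exploring starts
    td_update(v_td, s)

print([round(x, 2) for x in v_dp])
print([round(x, 2) for x in v_td])
```

Both value functions converge to the same fixed point here, but the TD learner never needs to enumerate the state space, which is what makes reinforcement-based control attractive when the plant model is large or unknown.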

The authors are grateful to Patrick Lesueur, who performed the experiments using Dynamic Programming.

¹ r(t) is equal to 0 if the system is in an allowed state and to −1 otherwise.
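The binary reward signal defined in the footnote can be sketched as follows; the set of allowed states and the state labels are hypothetical placeholders, since the paper's actual operating constraints for the plant are not given here.

```python
# Sketch of the footnote's reward signal r(t). ALLOWED_STATES is a
# hypothetical placeholder, not the paper's actual plant constraints.
ALLOWED_STATES = {"nominal", "filling"}  # hypothetical state labels

def r(state: str) -> float:
    # 0 while the system is in an allowed state, -1 otherwise
    return 0.0 if state in ALLOWED_STATES else -1.0

print(r("nominal"), r("overflow"))
```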

Editor information

José Mira, Joan Cabestany, Alberto Prieto

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Langlois, T., Canu, S. (1993). B-Learning: A reinforcement learning algorithm, comparison with dynamic programming. In: Mira, J., Cabestany, J., Prieto, A. (eds) New Trends in Neural Computation. IWANN 1993. Lecture Notes in Computer Science, vol 686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56798-4_157

  • DOI: https://doi.org/10.1007/3-540-56798-4_157

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56798-1

  • Online ISBN: 978-3-540-47741-9

  • eBook Packages: Springer Book Archive
