Abstract
In this paper we present a Reinforcement Learning method — B-Learning — for the control of a water production plant. A comparison between B-Learning and Dynamic Programming is provided from both theoretical and performance points of view. It is shown that Reinforcement-based neural control can lead to results comparable in quality to those obtained with Dynamic Programming, while being less computationally expensive.
The authors are grateful to Patrick Lesueur, who performed the experiments using Dynamic Programming.
¹ r(t) is equal to 0 if the system is in an allowed state and to −1 otherwise.
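The binary reward signal described in the footnote can be sketched as follows; the function name and boolean argument are illustrative, not from the paper:

```python
def reward(state_is_allowed: bool) -> float:
    """Failure-driven reward from the footnote: 0 while the plant stays
    in an allowed state, -1 as soon as a constraint is violated."""
    return 0.0 if state_is_allowed else -1.0
```

With this signal the controller receives no positive feedback; it is penalized only on entering a forbidden state, which is a common formulation for avoidance-style reinforcement learning tasks.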
References
Charles W. Anderson. Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, pages 31–37, April 1989.
Andrew G. Barto, Steven J. Bradtke, and Satinder P. Singh. Real-time learning and control using asynchronous dynamic programming. Technical Report 91-57, University of Massachusetts, Dept of Computer Science, Amherst MA 01003, August 1991.
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning problems. IEEE Transactions on Systems, Man and Cybernetics, SMC-13(5):834–846, September–October 1983.
J. C. Hoskins and D. M. Himmelblau. Process control via incremental neural networks and reinforcement learning. In Chicago Meeting of the American Institute of Chemical Engineers, Chicago, Illinois, November 1990.
Wayne C. Jouse and John G. Williams. The control of nuclear reactor start-up using drive reinforcement theory. In Cihan H. Dagli, Soundar R. T. Kumara, and Yung C. Shin, editors, Intelligent Engineering Systems Through Artificial Neural Networks, pages 537–544, St. Louis, Missouri, USA, November 1991.
R. Kora, P. Lesueur, and P. Villon. An adaptive optimal control algorithm for water treatment plants. To be published, 1992.
Thibault Langlois and Stéphane Canu. B-learning: a reinforcement learning variant for the control of a plant. In Intelligent Engineering Systems Through Artificial Neural Networks (ANNIE'92). ASME Press, 1992.
Thibault Langlois and Stéphane Canu. Control of time-delay systems using reinforcement learning. In Artificial Neural Networks, 2. Elsevier Science Publishers, 1992.
Long-Ji Lin. Programming robots using reinforcement learning and teaching. In Ninth National Conference on Artificial Intelligence, pages 781–786, 1991.
Long-Ji Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3–4):293–321, 1992.
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44, 1988.
Richard S. Sutton. Integrated modeling and control based on reinforcement learning and dynamic programming. In Richard P. Lippmann, John E. Moody, and David S. Touretzky, editors, Advances in Neural Information Processing Systems, volume 3, pages 471–478. Morgan Kaufmann, 1990.
Christopher J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University Psychology Department, 1989.
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
Cite this paper
Langlois, T., Canu, S. (1993). B-Learning: A reinforcement learning algorithm, comparison with dynamic programming. In: Mira, J., Cabestany, J., Prieto, A. (eds) New Trends in Neural Computation. IWANN 1993. Lecture Notes in Computer Science, vol 686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56798-4_157
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56798-1
Online ISBN: 978-3-540-47741-9